Hyper-Systolic Implementation of BLAS-3 Routines on the APE100/Quadrics
Abstract
Basic Linear Algebra Subroutines (BLAS-3) [1] are building blocks for solving many numerical problems (Cholesky factorization, Gram-Schmidt orthonormalization, LU decomposition, ...). Their efficient implementation on a given parallel machine is key to fully exploiting the system's computational power. In this work we consider a massively parallel SIMD machine (the APE100/Quadrics [2]) and adopt the hyper-systolic method [3, 6, 4] to implement BLAS-3 efficiently on it. The results we achieved (roughly 60-70% of peak performance for large matrices) demonstrate the validity of the proposed approach. The work is structured as follows: section 1 reviews BLAS-3, section 2 recalls the hyper-systolic method, section 3 describes the target machine, section 4 presents the hyper-systolic (HS) implementation, and section 5 gives some experimental results.
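To make the data flow concrete, here is a minimal Python sketch of the central Level-3 operation C <- alpha*A*B + beta*C (GEMM) and of a plain systolic ring schedule for it. It assumes p simulated processors on a 1D ring: each keeps a row block of A and C, while the column blocks of B are shifted cyclically around the ring, with one compute step per shift. The hyper-systolic method cited above refines this kind of systolic data flow, but that refinement is not modelled here, and neither is any APE100/Quadrics-specific detail. The function names (`gemm_reference`, `split`, `gemm_systolic_ring`) are hypothetical.

```python
# Minimal sketch, assuming row-major lists of lists and p simulated processors
# on a 1D ring. This is a plain systolic ring scheme, NOT the paper's
# hyper-systolic algorithm; all function names here are hypothetical.


def gemm_reference(alpha, A, B, beta, C):
    """Sequential Level-3 GEMM: returns alpha*A*B + beta*C."""
    n, k, m = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][l] * B[l][j] for l in range(k)) + beta * C[i][j]
             for j in range(m)]
            for i in range(n)]


def split(length, p):
    """Split range(length) into p contiguous, nearly equal (start, end) blocks."""
    q, r = divmod(length, p)
    bounds, start = [], 0
    for i in range(p):
        end = start + q + (1 if i < r else 0)
        bounds.append((start, end))
        start = end
    return bounds


def gemm_systolic_ring(alpha, A, B, beta, C, p):
    """Processor i keeps row block i of A and C; the column blocks of B travel
    around the ring, one cyclic shift per step, so after p steps every
    processor has multiplied its rows of A by every column block of B."""
    n, k, m = len(A), len(B), len(B[0])
    rows, cols = split(n, p), split(m, p)
    local_A = [[A[i] for i in range(r0, r1)] for r0, r1 in rows]
    local_C = [[[beta * C[i][j] for j in range(m)] for i in range(r0, r1)]
               for r0, r1 in rows]
    # held[proc] = (index of the B column block stored on proc, block data)
    held = [(b, [B[l][c0:c1] for l in range(k)]) for b, (c0, c1) in enumerate(cols)]
    for _step in range(p):
        # Compute phase: every simulated processor does the same local work
        # (SIMD style); here the processors are simply visited in a loop.
        for proc in range(p):
            b, B_blk = held[proc]
            c0, c1 = cols[b]
            for li, a_row in enumerate(local_A[proc]):
                for lj in range(c1 - c0):
                    s = sum(a_row[l] * B_blk[l][lj] for l in range(k))
                    local_C[proc][li][c0 + lj] += alpha * s
        # Communication phase: every B block moves to the next processor.
        held = held[-1:] + held[:-1]
    # Gather the row blocks of C back into one matrix.
    return [row for block in local_C for row in block]


A = [[1, 2], [3, 4], [5, 6]]
B = [[1, 0, 2], [0, 1, 3]]
C = [[1, 1, 1] for _ in range(3)]
for p in (1, 2, 3, 4):
    assert gemm_systolic_ring(2, A, B, 0.5, C, p) == gemm_reference(2, A, B, 0.5, C)
print(gemm_reference(2, A, B, 0.5, C))
# -> [[2.5, 4.5, 16.5], [6.5, 8.5, 36.5], [10.5, 12.5, 56.5]]
```

In this plain ring schedule processor i holds column block (i - t) mod p at step t, so p steps are needed before every processor has seen every block of B. The sketch only checks that the distributed result matches the sequential GEMM. It makes no claim about the performance figures reported above.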
Similar resources
Hyper-Systolic Implementation of BLAS-3 Routines on the APE100/Quadrics Machine
Basic Linear Algebra Subroutines (BLAS-3) [Cho 92] are the building blocks for solving many numerical problems (Cholesky factorization, Gram-Schmidt orthonormalization, LU decomposition, ...). Their efficient implementation on a given parallel machine is a key issue for the maximal exploitation of the system's computational power. In this work we refer to a massively parallel processing SIMD machin...
Hyper-systolic algorithms for N-body computations and parallel level-3 BLAS libraries
Hyper-systolic algorithms represent a new class of parallel computing structures. Because of their regular communication and compute patterns, they are well suited for implementation on most parallel architectures; in particular, high-performance SIMD machines can benefit considerably. After a short explanation of the concept of hyper-systolic algorithms, their application to N-body computations a...
BLASFEO: Basic linear algebra subroutines for embedded optimization
BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization. A key difference with respect to existing high-performance implementations of BLAS is that the computational performance is optimized for small to medium scale matrices, i.e., for sizes up to a few hundred. BLASFEO comes with three different impl...
Level 2 and 3 BLAS in the NAG C Library
This report describes a set of matrix-vector routines (Level 2 BLAS) and matrix-matrix routines (Level 3 BLAS) written in C. These routines have been included in Mark of the NAG C Library and are used by other library routines in that library. Details are given of the implementation, testing and use of the routines, and a complete listing of all the ANSI C function prototypes is included in the Appendix. Th...
Publication date: 1998